Tree-guided Group Lasso for Multi-response Regression with Structured Sparsity, with an Application to eQTL

Authors

  • Seyoung Kim
  • Eric P. Xing
Abstract

We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical structure over the responses, such as a hierarchical clustering tree with leaf nodes for responses and internal nodes for clusters of related responses at multiple granularity, and we seek to leverage this structure to recover covariates relevant to each hierarchically-defined cluster of responses. We propose tree-guided group lasso, or tree lasso, for estimating such structured sparsity under multi-response regression by employing a novel penalty function constructed from the tree. We describe a systematic weighting scheme for the overlapping groups in the tree-penalty such that each regression coefficient is penalized in a balanced manner despite the inhomogeneous multiplicity of group memberships of the regression coefficients due to overlaps among groups. For efficient optimization, we employ a smoothing proximal gradient method that was originally developed for a general class of structured-sparsity-inducing penalties. Using simulated and yeast datasets, we demonstrate that our method shows a superior performance in terms of both prediction errors and recovery of true sparsity patterns, compared to other methods for learning a multivariate-response regression.
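
As a rough illustration of the kind of penalty the abstract describes, the sketch below sums weighted L2 norms of regression coefficients over groups of responses, one group per tree node. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `tree_lasso_penalty`, the toy tree, and the node weights are all hypothetical, and the weights here are picked by hand rather than derived from the paper's weighting scheme.

```python
import numpy as np

def tree_lasso_penalty(B, groups, weights, lam=1.0):
    """Weighted overlapping group penalty over tree nodes (illustrative only).

    B       : array of shape (J, K), J covariates by K responses.
    groups  : list of lists; each inner list holds the response indices
              under one tree node (leaves and internal nodes alike).
    weights : one non-negative weight per node (assumed, hand-picked here).
    lam     : overall regularization strength.
    """
    total = 0.0
    for idx, w in zip(groups, weights):
        # For every covariate (row), take the L2 norm of its coefficients
        # restricted to the responses in this node, then sum over covariates.
        total += w * np.linalg.norm(B[:, idx], axis=1).sum()
    return lam * total

# Hypothetical tree over 3 responses: three leaves, one internal node
# joining responses 0 and 1, and the root covering all responses.
groups = [[0], [1], [2], [0, 1], [0, 1, 2]]
weights = [1.0, 1.0, 1.0, 0.5, 0.5]  # assumed values, for illustration

# Two covariates; the second is irrelevant to every response.
B = np.array([[3.0, 4.0, 0.0],
              [0.0, 0.0, 0.0]])

penalty = tree_lasso_penalty(B, groups, weights, lam=1.0)
print(penalty)
```

Because internal nodes group several responses under a single L2 norm, a covariate tends to be selected or dropped jointly for all responses in a cluster, which is the structured sparsity the abstract refers to. The all-zero second row contributes nothing to the penalty.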

Related articles

Tree-guided Group Lasso for Multi-response Regression with Structured Sparsity, with an Application to eQTL Mapping, by Seyoung Kim

We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical structure over the responses, such as a hierarchical clus...

Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping

We consider the problem of learning a sparse multi-task regression with an application to a genetic association mapping problem for discovering genetic markers that influence expression levels of multiple genes jointly. In particular, we consider the case where the structure over the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at...

Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity

We consider the problem of learning a sparse multi-task regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularity. Our goal is to recover the common set of relevant inputs for each output cluster. Assuming that the tree structure is available as prior knowledge, we formulate this p...

A Two-Graph Guided Multi-task Lasso Approach for eQTL Mapping

Learning a small number of genetic variants associated with multiple complex genetic traits is of practical importance and remains challenging due to the high-dimensional nature of the data. In this paper, we propose a two-graph guided multi-task Lasso to address this issue, with an emphasis on estimating subnetwork-to-subnetwork associations in expression quantitative trait loci (eQTL) mapping. The...

Structured Input-Output Lasso, with Application to eQTL Mapping, and a Thresholding Algorithm for Fast Estimation

We consider the problem of learning a high-dimensional multi-task regression model, under sparsity constraints induced by presence of grouping structures on the input covariates and on the output predictors. This problem is primarily motivated by expression quantitative trait locus (eQTL) mapping, of which the goal is to discover genetic variations in the genome (inputs) that influence the expr...

Publication date: 2012